multi-agent domain
Towards Multi-Agent Reinforcement Learning using Quantum Boltzmann Machines
Müller, Tobias, Roch, Christoph, Schmid, Kyrill, Altmann, Philipp
Reinforcement learning has driven impressive advances in machine learning. Simultaneously, quantum-enhanced machine learning algorithms using quantum annealing are undergoing rapid development. Recently, a multi-agent reinforcement learning (MARL) architecture combining both paradigms has been proposed. This novel algorithm, which utilizes Quantum Boltzmann Machines (QBMs) for Q-value approximation, has outperformed regular deep reinforcement learning in terms of the time-steps needed to converge. However, it was restricted to single-agent and small 2x2 multi-agent grid domains. In this work, we propose an extension of the original concept to solve more challenging problems. As in classic DQNs, we add an experience replay buffer and use separate networks for approximating the target and policy values. The experimental results show that learning becomes more stable and enables agents to find optimal policies in grid domains of higher complexity. Additionally, we assess how parameter sharing influences agent behavior in multi-agent domains. Quantum sampling proves to be a promising method for reinforcement learning tasks, but is currently limited by the QPU size and therefore by the size of the input and the Boltzmann machine.
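The two stabilizing mechanisms this abstract borrows from classic DQNs can be sketched classically; the following is a minimal illustration, not the QBM sampler itself, with a plain numpy table standing in for the Q-value approximator:

```python
import random
from collections import deque

import numpy as np

class ReplayDQN:
    """Sketch of DQN-style training with an experience replay buffer
    and separate policy/target value estimators (tabular stand-in)."""

    def __init__(self, n_states, n_actions, buffer_size=10_000,
                 gamma=0.95, lr=0.1, sync_every=100):
        self.q_policy = np.zeros((n_states, n_actions))
        self.q_target = self.q_policy.copy()
        self.buffer = deque(maxlen=buffer_size)
        self.gamma, self.lr, self.sync_every = gamma, lr, sync_every
        self.steps = 0

    def store(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def train_step(self, batch_size=32):
        batch = random.sample(self.buffer, min(batch_size, len(self.buffer)))
        for s, a, r, s_next, done in batch:
            # Bootstrap from the frozen target network, not the policy network.
            target = r if done else r + self.gamma * self.q_target[s_next].max()
            self.q_policy[s, a] += self.lr * (target - self.q_policy[s, a])
        self.steps += 1
        if self.steps % self.sync_every == 0:
            self.q_target = self.q_policy.copy()  # periodic hard sync
```

Decorrelating updates via sampled minibatches and bootstrapping from a periodically synced copy are exactly the ingredients the abstract credits with the improved stability.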
Experience Augmentation: Boosting and Accelerating Off-Policy Multi-Agent Reinforcement Learning
Ye, Zhenhui, Chen, Yining, Song, Guanghua, Yang, Bowei, Fan, Shen
Exploration of the high-dimensional state-action space is one of the biggest challenges in Reinforcement Learning (RL), especially in multi-agent domains. We present a novel technique called Experience Augmentation, which enables time-efficient and boosted learning based on fast, fair, and thorough exploration of the environment. It can be combined with arbitrary off-policy MARL algorithms and is applicable to both homogeneous and heterogeneous environments. We demonstrate our approach by combining it with MADDPG and verifying its performance in two homogeneous and one heterogeneous environment. In the best-performing scenario, MADDPG with experience augmentation reaches the convergence reward of vanilla MADDPG in a quarter of the wall-clock time, and its converged performance beats the original model by a significant margin. Our ablation studies show that experience augmentation is the crucial ingredient that accelerates the training process and boosts convergence.
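The abstract does not spell out the augmentation mechanism. One plausible reading for homogeneous settings (an assumption here, not a claim about the paper's exact method) is that agents are interchangeable, so permuting agent indices in a stored transition yields additional valid replay entries:

```python
from itertools import permutations

def augment_transition(obs, actions, rewards, next_obs):
    """Generate extra replay entries for a homogeneous multi-agent
    transition by permuting agent indices. Sketch: assumes every
    permutation of interchangeable agents is an equally valid sample."""
    n = len(obs)
    augmented = []
    for perm in permutations(range(n)):
        augmented.append((
            [obs[i] for i in perm],
            [actions[i] for i in perm],
            [rewards[i] for i in perm],
            [next_obs[i] for i in perm],
        ))
    return augmented

# One 2-agent environment step yields 2! = 2 replay entries.
extra = augment_transition(["o1", "o2"], ["a1", "a2"], [1.0, 0.5], ["o1'", "o2'"])
```

Under this reading, each real environment step populates the replay buffer with factorially many samples, which is consistent with the reported speed-up in wall-clock time.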
An Action Language for Multi-Agent Domains: Foundations
Baral, Chitta, Gelfond, Gregory, Pontelli, Enrico, Son, Tran Cao
In multi-agent domains (MADs), an agent's action may not only change the world and the agent's own knowledge and beliefs about the world, but may also change other agents' knowledge and beliefs about the world, as well as their knowledge and beliefs about other agents' knowledge and beliefs about the world. The goals of an agent in a multi-agent world may involve manipulating the knowledge and beliefs of other agents -- again, not just their knowledge/beliefs about the world, but also their knowledge about other agents' knowledge about the world. Our goal is to present an action language (mA+) that has the necessary features to address the above aspects of representation and reasoning about action and change (RAC) in MADs. mA+ allows the representation of, and reasoning about, different types of actions that an agent can perform in a domain where many other agents might be present -- such as world-altering actions, sensing actions, and announcement/communication actions. It also allows the specification of agents' dynamic awareness of action occurrences, which has future implications for what agents know about the world and about other agents' knowledge of the world. mA+ considers three different types of awareness of an action occurrence and its effects: full awareness, partial awareness, and complete oblivion. This keeps the language simple, yet powerful enough to address a large variety of knowledge-manipulation scenarios in MADs. The semantics of mA+ relies on the notion of a state, which is described by a pointed Kripke model and is used to encode the agent's knowledge and the real state of the world. The semantics is defined by a transition function that maps pairs of actions and states into sets of states. We illustrate properties of the action theories, including properties that guarantee finiteness of the set of initial states and their practical implementability. Finally, we relate mA+ to other formalisms that contribute to RAC in MADs.
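The state-and-transition semantics described above can be made concrete with a toy pointed Kripke structure. The names, the data layout, and the single sensing action below are illustrative sketches under the standard epistemic-logic treatment, not the mA+ syntax itself:

```python
from dataclasses import dataclass

@dataclass
class PointedKripke:
    """Toy pointed Kripke structure: worlds carry fluent valuations,
    each agent has an accessibility relation, and one world is real."""
    worlds: dict   # world name -> frozenset of true fluents
    access: dict   # agent name -> set of (w, w') accessibility pairs
    pointed: str   # the world designated as real

def sense(state, agent, fluent):
    """One case of a transition function mapping (action, state) -> state:
    a sensing action by a fully aware agent discards accessible worlds
    that disagree with the real world on `fluent`."""
    truth = fluent in state.worlds[state.pointed]
    kept = {(w, v) for (w, v) in state.access[agent]
            if (fluent in state.worlds[v]) == truth}
    new_access = dict(state.access)
    new_access[agent] = kept
    return PointedKripke(state.worlds, new_access, state.pointed)
```

After sensing, the agent considers only worlds consistent with the observed truth value, which is how sensing actions refine knowledge without altering the world itself.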
EFP and PG-EFP: Epistemic Forward Search Planners in Multi-Agent Domains
Le, Tiep (New Mexico State University) | Fabiano, Francesco (New Mexico State University) | Son, Tran Cao (New Mexico State University) | Pontelli, Enrico (New Mexico State University)
This paper presents two prototypical epistemic forward planners, called EFP and PG-EFP, for generating plans in multi-agent environments. These planners differ from recently developed epistemic planners in that they can deal with unlimited nested beliefs and common knowledge, and are capable of generating plans with both knowledge and belief goals. EFP is simply a breadth-first-search planner, while PG-EFP is a heuristic-search-based system. To generate heuristics in PG-EFP, the paper introduces the notion of an epistemic planning graph. The paper includes an evaluation of the planners using benchmarks collected from the literature and discusses the issues that affect their scalability and efficiency, thus identifying potential directions for future work. It also includes an experimental evaluation that demonstrates the usefulness of epistemic planning graphs.
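The breadth-first forward search underlying EFP can be sketched generically. The callables here are placeholders for a real epistemic transition system (states would be pointed Kripke models and actions epistemic actions); the skeleton itself is just standard BFS:

```python
from collections import deque

def bfs_plan(initial, goal_holds, actions, apply_action):
    """Breadth-first forward search of the kind EFP performs: expand
    states level by level and return the first action sequence whose
    resulting state satisfies the goal, or None if none exists."""
    frontier = deque([(initial, [])])
    seen = {initial}
    while frontier:
        state, plan = frontier.popleft()
        if goal_holds(state):
            return plan
        for act in actions:
            nxt = apply_action(state, act)  # None if act is inapplicable
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [act]))
    return None

# Toy domain: count from 0 to 3 with a single "inc" action.
plan = bfs_plan(0, lambda s: s == 3, ["inc"],
                lambda s, a: s + 1 if s < 3 else None)
```

PG-EFP replaces this blind level-by-level expansion with heuristic search guided by an epistemic planning graph, which is where the scalability gains reported in the paper come from.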
Reasoning with Doxastic Attitudes in Multi-Agent Domains
Wright, Ben (New Mexico State University) | Pontelli, Enrico (New Mexico State University)
In recent years, we have witnessed a blossoming of research proposals addressing the challenges of reasoning about action and change in domains that include an agent operating in a multi-agent setting. In particular, the recent emphasis has been on dealing with domains that involve agents reasoning not only about the state of the world but also about the knowledge and beliefs of other agents. An open challenge is the management of conflicting and incorrect beliefs. This paper introduces a solution to this problem through the use of doxastic attitudes. Building on top of the action language mA+, we extend the transition functions of an agent to include this notion of attitudes and showcase how they work in two different examples.
Reasonableness Monitors
As we move towards autonomous machines responsible for making decisions previously entrusted to humans, there is an immediate need for machines to be able to explain their behavior and defend the reasonableness of their actions. To implement this vision, each part of a machine should be aware of the behavior of the other parts it cooperates with. Each part must be able to explain the observed behavior of those neighbors in the context of the shared goal of the local community. If such an explanation cannot be made, it is evidence that either a part has failed (or was subverted) or the communication has failed. The development of reasonableness monitors is work towards generalizing that vision, with the intention of developing a system-construction methodology that enhances both robustness and security at runtime (rather than at compile time) by dynamically checking and explaining the behaviors of parts and subsystems for reasonableness in context.